Abstract

Small peptides are model molecules for the amino acid residues that are the constituents of proteins.
In any bottom-up approach to understanding the properties of these macromolecules, which are essential
to the functioning of every living being, correctly describing the conformational behaviour of small
peptides constitutes an unavoidable first step. In this work, we present a study of several potential
energy surfaces (PESs) of the model dipeptide HCO-L-Ala-NH2. The PESs are calculated using the B3LYP
density-functional theory (DFT) method, with Dunning's basis sets cc-pVDZ, aug-cc-pVDZ, cc-pVTZ,
aug-cc-pVTZ, and cc-pVQZ. These calculations, whose cost amounts to approximately 10 years of computer
time, allow us to study the basis set convergence of the B3LYP method for this model peptide. Also, we
compare the B3LYP PESs to a previous computation at the MP2/6-311++G(2df,2pd) level, in order to assess
their accuracy with respect to a higher level reference. All data sets have been analyzed according to
a general framework which can be extended to other complex problems and which captures the nearness
concept in the space of model chemistries (MCs).

1 Introduction



In any bottom-up attempt to understand the behaviour of protein molecules (in particular, the still 
elusive protein folding process [1-5]), the characterization of the conformational preferences of 
short peptides [6-13] constitutes an unavoidable first step. Due to the lower numerical effort re- 
quired and also to the manageability of their conformational space, the most frequently studied 
peptides are the shortest ones: the dipeptides [14-17], in which a single amino acid residue is 
capped at both the N- and C-termini with neutral peptide groups. Among them, the most popular 
choice has been the alanine dipeptide [6, 18-34], which, being the simplest chiral residue, shares
many similarities with most other dipeptides at the minimum computational price.

Although classical force fields [35-43] are the only feasible choice for simulating large molecules
at present, they have been reported to yield inaccurate potential energy surfaces (PESs) for
dipeptides [29, 44-47] and short peptides [6, 48]. Therefore, it is not surprising that they are widely
recognized as being unable to correctly describe the intricacies of the whole protein folding pro- 
cess [44, 49-55]. On the other hand, albeit prohibitively demanding in terms of computational 
resources, ab initio quantum mechanical calculations [56-58] are not only regarded as the correct 
physical description that in the long run will be the preferred choice to directly tackle proteins 
(given the exponential growth of computer power and the advances in the search for pleasantly 
scaling algorithms [59, 60]), but they are also used in small peptides as the reference against which 
less accurate methods must be compared [6, 29, 44, 45, 47, 61, 62] in order to, for example, 
parameterize improved generations of additive, classical force fields for polypeptides. 

However, despite the sound theoretical basis, in practical quantum chemistry calculations a 
plethora of approximations must be typically made if one wants to obtain the final results in a 
reasonable human time. The exact 'recipe' that includes all the assumptions and steps needed 
to calculate the relevant observables for any molecular system has been termed model chemistry 
(MC) by John Pople. In his own words, a MC is an "approximate but well-defined general and 
continuous mathematical procedure of simulation" [63]. 

After assuming that the particles involved move at non-relativistic velocities and that the greater
mass of the nuclei allows us to make the Born-Oppenheimer approximation, we are left with the
problem of solving the non-relativistic electronic Schrödinger equation [60]. The two starting
approximations to its exact solution that a MC must contain are, first, the truncation of the
N-electron space (in wavefunction-based methods) or the choice of the functional (in DFT) and,
second, the truncation of the one-electron space, via the LCAO scheme (in both cases). The extent
to which the first truncation is carried (or the functional chosen in the case of DFT) is commonly
called the method and it is denoted by acronyms such as RHF, MP2, B3LYP, CCSD(T), FCI,
etc., whereas the second truncation is embodied in the definition of a finite set of atom-centered
Gaussian functions termed basis set [57, 58, 60, 64, 65], which is also designated by conventional
short names, such as 6-31+G(d), TZP or cc-pVTZ(-f). If we denote the method by a capital M and
the basis set by a B, the specification of both is conventionally denoted by L := M/B and called a
level of the theory. Typical examples of this are RHF/3-21G or MP2/cc-pVDZ [56-58].

Note that, apart from these approximations, which are the most commonly used and the only
ones that are considered in this work, the MC concept may include many additional features: the
heterolevel approximation (explored in a previous work in this series [34]), protocols for
extrapolating to the infinite-basis set limit [66-70], additivity assumptions [71-74], extrapolations
of the Møller-Plesset series to infinite order [75], removal of the so-called basis set superposition
error (BSSE) [76-82], etc. The reason behind most of these techniques is the pressing need to reduce
the computational cost of the calculations.

Now, although general applicability is a requirement that all MCs must satisfy, general accuracy
is not mandatory. In fact, the different procedures that make up a given MC are typically
parameterized and tested on very particular systems, which are often small molecules.
Therefore, the validity of the approximations outside that native range of problems must always be
questioned and checked. However, while the approximate computational cost of a given MC for
a particular system is rather easy to predict on the basis of simple scaling relations, its expected
accuracy on a particular problem can be difficult to predict a priori, especially if we are dealing
with large molecules in which interactions on very different energy scales play a role.
The description of the conformational behaviour of peptides (or, more generally, flexible organic
species), via their PESs in terms of the soft internal coordinates, is one such problem and the
one that is treated in this work.

To this end, we first describe, in sec. 2, the computational and theoretical methods used
throughout the rest of the document. Then, in sec. 3, we introduce a basic framework that rationalizes
the actual process of evaluating the efficiency of any MC for a complex problem. These general ideas
are used, in sec. 4, to perform a study of the density-functional theory (DFT) B3LYP [83-86]
method with Dunning's cc-pVDZ, aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ basis
sets [87, 88]. We apply these levels of the theory to the calculation of the PES of the
model dipeptide HCO-L-Ala-NH2 (see fig. 1), and assess their efficiency by comparison with a
reference PES. Finally, in sec. 5, the most important conclusions are briefly summarized.



2 Methods 

All ab initio quantum mechanical calculations have been performed using the GAMESS-US
program [89, 90] under Linux and on 2.2 GHz PowerPC 970FX machines with 2 GB of RAM.

The internal coordinates used for the Z-matrix of the HCO-L-Ala-NH2 dipeptide in the GAMESS-US
input files are the Systematic Approximately Separable Modular Internal Coordinates (SASMIC)
ones introduced in ref. 91. They are presented in table 1 (see also fig. 1 for reference).

All PESs in this study have been discretized into a regular 12x12 grid in the bidimensional
space spanned by the Ramachandran angles $\phi$ and $\psi$, with both of them ranging from -165° to
165° in steps of 30°. To calculate the PES at a particular level of the theory, we have run constrained
energy optimizations at each point of the grid, freezing the two Ramachandran angles $\phi$ and $\psi$
at the corresponding values. In order to save computational resources, the starting structures were
taken, when possible, from PESs previously optimized at a lower level of the theory. All the basis sets
used in the study have been taken from the GAMESS-US internally stored library, and spherical
Gaussian-type orbitals (GTOs) have been preferred, thus having 5 d-type and 7 f-type functions
per shell.
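The grid described above can be enumerated in a few lines. The following Python sketch only illustrates the discretization; the function name `ramachandran_grid` is hypothetical, and the constrained optimizations themselves (run with GAMESS-US in this work) are not reproduced.

```python
from itertools import product

def ramachandran_grid(start=-165, stop=165, step=30):
    """Return the (phi, psi) pairs, in degrees, of a regular grid."""
    angles = list(range(start, stop + 1, step))
    return list(product(angles, angles))

grid = ramachandran_grid()
print(len(grid))          # number of constrained optimizations per PES
print(grid[0], grid[-1])  # first and last grid points
```

Each of the five PESs therefore requires one constrained optimization per element of `grid`.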

We have computed 5 PESs, using the DFT B3LYP [83-86] method with Dunning's cc-pVDZ,
aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ basis sets [87, 88]. The total cost of
these calculations on the machines used is around 10 years of computer time.
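As a consistency check on the figure of around 10 years, the per-PES computer times reported in table 3 (in days) can be added up. The short Python sketch below does only this arithmetic; the conversion factor of 365.25 days per year is our assumption.

```python
# Computer time per PES, in days, as reported in table 3.
days = {
    "B3LYP/cc-pVQZ": 1861,
    "B3LYP/aug-cc-pVTZ": 1472,
    "B3LYP/cc-pVTZ": 182,
    "B3LYP/aug-cc-pVDZ": 98,
    "B3LYP/cc-pVDZ": 24,
}

total_days = sum(days.values())
total_years = total_days / 365.25  # assumed days-per-year conversion
print(total_days, round(total_years, 2))
```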

Also, let us note that the correcting terms to the PES coming from mass-metric tensors determi- 
nants and from the determinant of the Hessian matrix have been recently shown to be relevant for 
the conformational behaviour of peptides [18]. (The latter may be regarded as a residual entropy 
Atom name   Bond length   Bond angle   Dihedral angle

H1
C2          (2,1)
N3          (3,2)         (3,2,1)
O4          (4,2)         (4,2,1)      (4,2,1,3)
C5          (5,3)         (5,3,2)      (5,3,2,1)
H6          (6,3)         (6,3,2)      (6,3,2,5)
C7          (7,5)         (7,5,3)      $\phi$ := (7,5,3,2)
C8          (8,5)         (8,5,3)      (8,5,3,7)
H9          (9,5)         (9,5,3)      (9,5,3,7)
H10         (10,8)        (10,8,5)     (10,8,5,3)
H11         (11,8)        (11,8,5)     (11,8,5,10)
H12         (12,8)        (12,8,5)     (12,8,5,10)
N13         (13,7)        (13,7,5)     $\psi$ := (13,7,5,3)
O14         (14,7)        (14,7,5)     (14,7,5,13)
H15         (15,13)       (15,13,7)    (15,13,7,5)
H16         (16,13)       (16,13,7)    (16,13,7,15)

Table 1: Internal coordinates in Z-matrix form of the protected dipeptide HCO-L-Ala-NH2 according to
the SASMIC scheme introduced in ref. 91. The numbering of the atoms is that in fig. 1, and the soft
Ramachandran angles $\phi$ and $\psi$ are indicated.



Figure 1: Atom numbering of the protected dipeptide HCO-L-Ala-NH2 according to the SASMIC scheme
introduced in ref. 91. The soft Ramachandran angles $\phi$ and $\psi$ are also indicated.

arising from the elimination of the hard coordinates from the description.) Although, in this study, 
we have included none of these terms, the PES calculated here is the greatest part of the effective 
free energy [18], so that it may be considered as the first ingredient for a further refinement of the 
study in which the correcting terms are taken into account. The same may be said about another 
important source of error in the calculation of relative energies in peptide systems: the already
mentioned BSSE [31]. 



In order to compare the PESs produced by the different MCs, a statistical criterion (distance)
introduced in ref. 92 has been used. Let us recall here that this distance, denoted by $d_{12}$, profits
from the complex nature of the problem studied to compare any two different potential energy
functions, $V_1$ and $V_2$. From a working set of conformations (in this case, the 144 points of each
PES), it statistically measures the typical error that one makes in the energy differences if $V_2$ is
used instead of the more accurate $V_1$, admitting a linear rescaling and a shift in the energy
reference.

Despite having energy units, the quantity $d_{12}$ approximately presents all the properties
characteristic of a typical mathematical metric in the space of MCs (hence the word 'distance'), such
as the possibility of defining a symmetric version of it and the fulfillment of the triangle
inequality (see ref. 92 for the technical details and sec. 3 for more about the importance of these
facts). It also presents better properties than other quantities customarily used to perform these
comparisons, such as the energy RMSD, the average energy error, etc., and it may be related to
Pearson's correlation coefficient $r_{12}$ by

$$d_{12} = \sqrt{2}\,\sigma_2 \left(1 - r_{12}^2\right)^{1/2} , \qquad (1)$$

where $\sigma_2$ is the standard deviation of $V_2$ in the working set.
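The relation in eq. (1), i.e. the distance written as sqrt(2) times the standard deviation of V2 times sqrt(1 - r12^2), can be evaluated directly from two lists of energies on the working set. The following Python sketch is a minimal illustration and not the implementation of ref. 92: the function name `distance_d12` is hypothetical, the population standard deviation is assumed for V2, and the toy energies are invented for the example.

```python
import math

def distance_d12(v1, v2):
    """Evaluate sqrt(2) * sigma_2 * sqrt(1 - r_12**2) for two energy lists.

    v1 is the reference and v2 the approximation; sigma_2 is taken here as
    the population standard deviation of v2 (an assumption of this sketch).
    """
    n = len(v1)
    m1, m2 = sum(v1) / n, sum(v2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(v1, v2)) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in v1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in v2) / n)
    r12 = cov / (s1 * s2)  # Pearson's correlation coefficient
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(2.0) * s2 * math.sqrt(max(0.0, 1.0 - r12 ** 2))

# Invented toy energies (kcal/mol) on four conformations.
v_ref = [0.0, 1.0, 2.5, 4.0]
v_affine = [2.0 * e + 10.0 for e in v_ref]  # rescaled and shifted copy
v_noisy = [0.1, 0.8, 2.9, 3.7]

print(distance_d12(v_ref, v_affine))  # essentially zero
print(distance_d12(v_ref, v_noisy))   # strictly positive
```

The first call illustrates the statement that a linear rescaling and a shift in the energy reference are admitted: an exactly affine copy of the reference has a correlation coefficient of one and hence a vanishing distance.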

Moreover, due to its physical meaning, it has been argued in ref. 92 that, if the distance between
two different approximations of the energy of the same system is less than RT, one may safely
substitute one for the other without altering the relevant dynamical or thermodynamical behaviour.
Consequently, we shall present the results in units of RT (at 300 K, so that RT ≈ 0.6 kcal/mol).
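The value of RT quoted above follows from the molar gas constant. A minimal Python sketch of the unit conversion, assuming the thermochemical calorie (1 kcal = 4184 J); the function name is hypothetical:

```python
R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol K)
J_PER_KCAL = 4184.0          # thermochemical calorie: 1 kcal = 4184 J

def rt_kcal_per_mol(temperature_k):
    """Return RT in kcal/mol at the given absolute temperature."""
    return R_J_PER_MOL_K * temperature_k / J_PER_KCAL

print(round(rt_kcal_per_mol(300.0), 3))  # 0.596
```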

Finally, if one assumes that the effective energies compared will be used to construct a
polypeptide potential and that it will be designed as simply the sum of mono-residue ones (more complex
situations may be found in real problems [93]), then the number $N_{\mathrm{res}}$ of residues up to
which one may go keeping the distance $d_{12}$ between the two approximations of the $N$-residue
potential below RT is [92]

$$N_{\mathrm{res}} = \left(\frac{RT}{d_{12}}\right)^2 . \qquad (2)$$



According to the value taken by $N_{\mathrm{res}}$ for a comparison between a fixed reference PES,
denoted by $V_1$, and a candidate approximation, denoted by $V_2$, we shall divide the whole accuracy
range in sec. 4 into three regions: the protein region, corresponding
to $0 < d_{12} \le 0.1RT$, or, equivalently, to $100 \le N_{\mathrm{res}} < \infty$; the peptide
region, corresponding to $0.1RT < d_{12} \le RT$, or $1 \le N_{\mathrm{res}} < 100$; and, finally, the
inaccurate region, where $d_{12} > RT$, and
even for a dipeptide it is not advisable to use $V_2$ as an approximation to $V_1$. Of course, these are
only approximate regions based on the general idea that we are not interested in the dipeptides
as a final system, but only as a means to approach protein behaviour from the bottom up.
Therefore, not only must the error in the dipeptides be measured, but it must also be estimated how
this discrepancy propagates to polypeptide systems.
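The classification above can be sketched as a small function. This Python example assumes the relation N_res = (RT/d12)^2, which reproduces the N_res column of table 2 from its d12/RT column up to rounding; the function names and the treatment of the boundary values are illustrative choices, not taken from ref. 92.

```python
def n_res(d12_over_rt):
    """Number of residues up to which the distance stays below RT (assumed relation)."""
    return 1.0 / d12_over_rt ** 2

def accuracy_region(d12_over_rt):
    """Classify a distance, given in units of RT, into the three regions of the text."""
    if d12_over_rt <= 0.1:
        return "protein"
    if d12_over_rt <= 1.0:
        return "peptide"
    return "inaccurate"

# d12/RT values copied from table 2 (reference: B3LYP/cc-pVQZ).
table2 = {
    "B3LYP/aug-cc-pVTZ": 0.079,
    "B3LYP/cc-pVTZ": 0.191,
    "B3LYP/aug-cc-pVDZ": 0.172,
    "B3LYP/cc-pVDZ": 1.045,
}

for mc, d in table2.items():
    print(f"{mc:20s} N_res = {n_res(d):6.1f}  region: {accuracy_region(d)}")
```

Small differences with respect to the tabulated N_res values are to be expected, since the distances in table 2 are rounded to three decimal places.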
3 General framework

The general abstract framework behind the investigation presented in this study (and also implicitly
behind most of the works found in the literature) may be described as follows:

The objects of study are the model chemistries defined by Pople [63] and discussed in the
introduction. The MCs under scrutiny are applied to a particular problem of interest, which may
be thought to be formed by three ingredients: the physical system, the relevant observables and
the target accuracy. The MCs are then selected according to their ability to yield numerical
values of the relevant observables for the physical system studied within the target accuracy. The
concrete numerical values that one wants to approach are those given by the exact model
chemistry $\mathrm{MC}^{\varepsilon}$, which could be thought to be either the experimental data or the
exact solution of the non-relativistic electronic Schrödinger equation [60]. However, the computational
effort needed to perform the calculations required by $\mathrm{MC}^{\varepsilon}$ is literally infinite,
so that, in practice, one is forced to work with a reference model chemistry
$\mathrm{MC}^{\mathrm{ref}}$, which, albeit different from $\mathrm{MC}^{\varepsilon}$, is thought to be
close to it. Finally, the set of MCs that one wants to investigate is compared to
$\mathrm{MC}^{\mathrm{ref}}$ and the nearness to it is seen as approximating the nearness to
$\mathrm{MC}^{\varepsilon}$.

These comparisons are commonly performed using a numerical quantity D that is a function 
of the relevant observables. In order for the intuitive ideas about relative proximity in the M space 
to be captured and the above reasoning to be meaningful, this numerical quantity D must have 
some of the properties of a mathematical distance. In particular, it is advisable that the triangle 




Figure 2: Space M of all model chemistries. The exact model chemistry $\mathrm{MC}^{\varepsilon}$ is
shown as a black circle, the MP2 reference MC is shown as a grey-filled circle, and B3LYP MCs as
white-filled ones. Both reference PESs are indicated with an additional circle around the points. The
situation depicted is (schematically) the one found in this study, assuming that MP2 is a more accurate
method than B3LYP to account for the conformational preferences of peptide systems. The positions of
the different MCs have no relevance, and only the relative measured distances among them are
qualitatively depicted.
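inequality is obeyed, so that, for any model chemistry MC, one has that

$$D(\mathrm{MC}^{\varepsilon}, \mathrm{MC}) \le D(\mathrm{MC}^{\varepsilon}, \mathrm{MC}^{\mathrm{ref}}) + D(\mathrm{MC}^{\mathrm{ref}}, \mathrm{MC}) , \qquad (3a)$$
$$D(\mathrm{MC}^{\varepsilon}, \mathrm{MC}) \ge \left| D(\mathrm{MC}^{\varepsilon}, \mathrm{MC}^{\mathrm{ref}}) - D(\mathrm{MC}^{\mathrm{ref}}, \mathrm{MC}) \right| , \qquad (3b)$$

and, assuming that $D(\mathrm{MC}^{\varepsilon}, \mathrm{MC}^{\mathrm{ref}})$ is small (and $D$ is a
positive function), we obtain

$$D(\mathrm{MC}^{\varepsilon}, \mathrm{MC}) \simeq D(\mathrm{MC}^{\mathrm{ref}}, \mathrm{MC}) , \qquad (4)$$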

which is the sought result in agreement with the ideas stated at the beginning of this section. 
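As a numerical illustration of bounds of the type (3a) and (3b), one can take B3LYP/cc-pVQZ as the intermediate point and check whether the distances reported in tables 2 and 3 are mutually consistent. The Python sketch below uses the published values only; the helper name `within_triangle_bounds` is hypothetical and, since the distance is stated to fulfil the triangle inequality only approximately, this is a plausibility check rather than a proof.

```python
# Distances in units of RT at 300 K, copied from tables 2 and 3.
d_mp2_to_qz = 1.008  # MP2 reference vs. B3LYP/cc-pVQZ (table 3)
d_qz = {"aug-cc-pVTZ": 0.079, "cc-pVTZ": 0.191,
        "aug-cc-pVDZ": 0.172, "cc-pVDZ": 1.045}   # vs. B3LYP/cc-pVQZ (table 2)
d_mp2 = {"aug-cc-pVTZ": 1.029, "cc-pVTZ": 1.058,
         "aug-cc-pVDZ": 1.006, "cc-pVDZ": 1.533}  # vs. MP2 reference (table 3)

def within_triangle_bounds(d_ab, d_bc, d_ac):
    """True if |d_ab - d_bc| <= d_ac <= d_ab + d_bc."""
    return abs(d_ab - d_bc) <= d_ac <= d_ab + d_bc

checks = {basis: within_triangle_bounds(d_mp2_to_qz, d_qz[basis], d_mp2[basis])
          for basis in d_qz}
print(checks)
```

For the four basis sets the reported distance to the MP2 reference lies between the lower and the upper bound built from the other two distances.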

The distance $d_{12}$ introduced in ref. 92 and summarized in the previous section, measured in this
case on the conformational energy surfaces (the relevant observable) of the model dipeptide
HCO-L-Ala-NH2 (the physical system), approximately fulfills the triangle inequality and thus captures
the nearness concept in the space M of model chemistries.

This space, M, containing all possible MCs, is a rather complex and multidimensional one. 
For example, two commonly used 'dimensions' which may be thought to parameterize M are the 
size of the basis set and the amount of electron correlation in the model (or the quality of the 
DFT functional used). However, since there are many ways in which the size of a basis set or the 
electron correlation may be increased and there are additional approximations that can be included 
in the MC definition (see sec. 1), the 'dimensions' of M can be considered to be many more than 
two. 

The definition of a distance, such as the one described in the previous lines, for a given problem
of interest helps to give a certain degree of structure to this complex space. In fig. 2 a
two-dimensional scheme of the overall situation found in this study is presented.

4 Results 

Before starting with the results of the calculations, let us introduce the concept of efficiency of a
particular MC that shall be used: it is loosely defined as a balance between accuracy (in terms of



MCs                  $d_{12}/RT$ (a)   $a_{12}$ (b)   $N_{\mathrm{res}}$ (c)   $t$ (d)

B3LYP/aug-cc-pVTZ    0.079             15.2           159.8                    79.09%
B3LYP/cc-pVTZ        0.191             21.1           27.4                     9.78%
B3LYP/aug-cc-pVDZ    0.172             82.8           33.7                     5.27%
B3LYP/cc-pVDZ        1.045             109.4          0.9                      1.29%

Table 2: Basis set convergence results for the B3LYP MCs investigated in this work. (a) Distance with
the B3LYP/cc-pVQZ reference in units of RT at 300 K. (b) Energy offset with the reference MC in
kcal/mol. (c) Maximum number of residues in a polypeptide potential up to which the corresponding MC
may correctly approximate the reference (under the assumptions in sec. 2). (d) Required computer time,
expressed as a fraction of $t_{\mathrm{ref}}$.
Figure 3: Efficiency plot of all the B3LYP MCs studied. In the x-axis, we show the distance $d_{12}$,
in units of RT at 300 K, between any given MC and the B3LYP/cc-pVQZ reference (indicated by an
encircled point), while, in the y-axis, we present the computer time needed to compute the whole 12x12
grid in the Ramachandran space of the model dipeptide HCO-L-Ala-NH2. The different accuracy regions are
labeled, and the 10% of the time $t_{\mathrm{best}}$ taken by the reference MC is also indicated.



the distance introduced in sec. 2) and computational cost (in terms of computer time). It can be
graphically extracted from the efficiency plots, where the distance $d_{12}$ between any given MC and a
reference one is shown in units of RT in the x-axis, while, in the y-axis, one can find the computer
time taken for each MC (see figs. 3 and 5 for two examples). As a general rule of thumb, we
shall consider a MC to be more efficient for approximating the reference when it is placed closer
to the origin of coordinates in the efficiency plot. This approach is intentionally non-rigorous
due to the fact that many factors exist that influence the computer time but may vary from one
practical calculation to another, such as the algorithms used, the actual details of the computers
(frequency of the processor, size of the RAM and cache memories, system bus and disk access
velocity, operating system, mathematical libraries, etc.), the starting guesses for the SCF orbitals
or the starting structures in geometry optimizations.

Taking all this into account, the only conclusions that shall be drawn in this work about the 
relative efficiency of the MCs studied are those deduced from strong signals in the plots and, 
therefore, those that can be extrapolated to future calculations; in other words, the small details 
shall be typically neglected. 

In the first part of the study, we compare all B3LYP MCs to the one with the largest basis set, 
B3LYP/cc-pVQZ (the highest level of the theory calculated for this work, depicted in fig. 4) using 
the distance introduced in sec. 2. All mentions of the accuracy of any given MC in this part must
be understood as relative to this reference. However, it has been reported that MP2 is a superior 
method to B3LYP to account for the conformational behaviour of peptide systems [94]. Therefore, 
Figure 4: Potential energy surface of the model dipeptide HCO-L-Ala-NH2 computed at the
B3LYP/cc-pVQZ level of the theory. The PES has been originally calculated in a 12x12 discrete grid in
the space spanned by the Ramachandran angles $\phi$ and $\psi$ and later smoothed with bicubic splines
for visual convenience. The energy reference has been set to zero. (At this level of the theory, the
absolute energy of the minimum point in the 12x12 grid, located at (-75°, 75°), is -417.199231353
hartree.)



the absolute accuracy of the B3LYP MCs calculated here is probably closer to their accuracy relative
to the MP2/6-311++G(2df,2pd) reference, which is examined in what follows. In this spirit, this part of
the study should be regarded as an investigation of the convergence to the infinite basis set B3LYP
limit, for which the best B3LYP MC here is probably a good approximation.

The results are depicted in fig. 3, and in table 2. We can extract several conclusions from them: 

• Regarding the convergence to the infinite basis set limit, we observe that only the most
expensive MC, B3LYP/aug-cc-pVTZ, correctly approximates the reference for peptides of
more than 100 residues. On the other hand, for only 5.27% of the computer time $t_{\mathrm{ref}}$ taken
by the reference MC, we can use B3LYP/aug-cc-pVDZ, which correctly approximates it up
to 30-residue peptides. Finally, the MC with the smallest basis set, B3LYP/cc-pVDZ, cannot
properly replace the reference even in dipeptides.

• In ref. [34], using Pople's basis sets [95-102], we saw that "the general rule that is
sometimes assumed when performing quantum chemical calculations, which states that 'the more
expensive, the more accurate', is rather coarse-grained and relevant deviations from it may
be found." We recognized that "One may argue that this observation is due to the unsystematic
way in which Pople basis sets can be enlarged and that the correlation between accuracy
and cost will be much higher if, for example, only Dunning basis sets are used.", which is
definitely observed in fig. 3, but we argued that this was something to be expected, since
"there are too few Dunning basis sets below a reasonable upper bound on the number of
elements to see anything but a line in the efficiency plot". In the results presented in this
work, we can see that, even if the correlation between accuracy and cost is higher in the case
of Dunning's basis sets than in the case of Pople's, due to the smaller number of the former,
MCs                  $d_{12}/RT$ (a)   $a_{12}$ (b)   $N_{\mathrm{res}}$ (c)   $t$ (d)

B3LYP/cc-pVQZ        1.008             -457.2         0.98                     1861
B3LYP/aug-cc-pVTZ    1.029             -442.0         0.94                     1472
B3LYP/cc-pVTZ        1.058             -436.1         0.89                     182
B3LYP/aug-cc-pVDZ    1.006             -374.4         0.99                     98
B3LYP/cc-pVDZ        1.533             -347.8         0.43                     24

Table 3: Comparison of all the B3LYP MCs investigated in this work with the MP2/6-311++G(2df,2pd)
reference in ref. 34. (a) Distance with the MP2/6-311++G(2df,2pd) reference in units of RT at 300 K.
(b) Energy offset with the reference MC in kcal/mol. (c) Maximum number of residues in a polypeptide
potential up to which the corresponding MC may correctly approximate the reference (under the
assumptions in sec. 2). (d) Computer time needed for the calculation of the whole PES, in days.

we can still observe that the rule of thumb 'the more expensive, the more accurate' also breaks down
in this case, since the B3LYP/aug-cc-pVDZ MC is, at the same time, more accurate and less
costly than B3LYP/cc-pVTZ. In general, this idea applies to all the approximations that a
MC may contain (see the introduction for a partial list), and justifies the systematic search
for the most efficient combination of them for a given problem. This work is our second step
(ref. [34] is the first one) on that path for the particular case of the conformational behaviour
of peptide systems.

• The observation in the previous point also suggests that it may be efficient to include diffuse
functions (the 'aug-' in aug-cc-pVDZ) in the basis set for this type of problem.

• The error of the studied MCs regarding the differences of energy (as measured by $d_{12}$) is
much smaller than the error in the absolute energies (as measured by $a_{12}$), suggesting that
the largest part of the discrepancy must be a systematic one.
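The breakdown of the rule 'the more expensive, the more accurate' noted above can be made explicit by looking for MCs that are both less accurate and more costly than some other MC. The following Python sketch applies this dominance test to the values in tables 2 and 3; the function name `dominated` and the dominance criterion itself are our illustrative choices, not a definition taken from the efficiency plots.

```python
# For each MC: (d12/RT with respect to B3LYP/cc-pVQZ, from table 2;
#               computer time in days, from table 3).
mcs = {
    "B3LYP/aug-cc-pVTZ": (0.079, 1472),
    "B3LYP/cc-pVTZ": (0.191, 182),
    "B3LYP/aug-cc-pVDZ": (0.172, 98),
    "B3LYP/cc-pVDZ": (1.045, 24),
}

def dominated(data):
    """Map each MC that is beaten in both accuracy and cost to the MCs that beat it."""
    result = {}
    for name, (d, t) in data.items():
        better = [other for other, (d2, t2) in data.items()
                  if other != name and d2 <= d and t2 <= t and (d2 < d or t2 < t)]
        if better:
            result[name] = better
    return result

print(dominated(mcs))
```

With the tabulated values, B3LYP/cc-pVTZ is the only MC in the set that is beaten on both counts, and the MC that beats it is B3LYP/aug-cc-pVDZ.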

In the second part of the study, we assess the absolute accuracy of the B3LYP MCs by
comparing them to the (as far as we are aware) highest homolevel in the literature, the
MP2/6-311++G(2df,2pd) PES in ref. [34]. If one assumes that this level of the theory is close enough to
the exact result for the problem at hand, then this comparison measures the 'absolute' accuracy
of the B3LYP MCs, and not only their relative accuracy with respect to the B3LYP infinite basis
set limit, as we did in the previous part. This is the fundamental difference between figs. 3 and 5.

The results of this part of the study are depicted in fig. 5, and in table 3. We can extract several 
conclusions from them: 

• All B3LYP MCs, including the largest one, B3LYP/cc-pVQZ, lie in the inaccurate region
of the efficiency plot in fig. 5, meaning that they cannot be reliably used to approximate the
MP2/6-311++G(2df,2pd) reference even in the smallest dipeptides.



B3LYP basis set convergence in peptides : Echenique and Chass 



Figure 5: Efficiency plot of all the B3LYP MCs studied. In the x-axis, we show the distance $d_{12}$, in
units of RT at 300 K, between any given MC and the MP2/6-311++G(2df,2pd) reference calculated in
ref. 34, while, in the y-axis, we present the computer time needed to compute the whole 12x12 grid in the
Ramachandran space of the model dipeptide HCO-L-Ala-NH2. The different accuracy regions are labeled.



• In relation to the observations in the previous part of the study, we see that there is no point,
if one is worried about absolute accuracy, in going beyond the aug-cc-pVDZ basis set in
B3LYP.

• The B3LYP/cc-pVDZ MC again performs significantly worse than the rest, agreeing with
the results in the previous part of the study, and suggesting that cc-pVDZ may be too small a
basis set for the problem tackled here.

• Again, the error of the MCs in the differences of energy (as measured by $d_{12}$) is much smaller
than the error in the absolute energies (as measured by $a_{12}$).



5 Conclusions 

In this study, we have investigated 5 PESs of the model dipeptide HCO-L-Ala-NH2, calculated
with the B3LYP method and Dunning's cc-pVDZ, aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ
basis sets. We have first assessed the convergence of the B3LYP MCs to the infinite
basis set limit, and then we have evaluated their absolute accuracy by comparing them to the
(as far as we are aware) highest homolevel in the literature, the MP2/6-311++G(2df,2pd) PES
in ref. [34]. All the comparisons have been performed according to a general framework which is
extensible to further studies, and using a distance between the different PESs that correctly captures
the nearness concept in the space of MCs. The calculations performed here have taken around 10
years of computer time.

The main conclusions of the study are the following: 

• The complexity of the problem (the conformational behaviour of peptides) renders the cor- 
relation between accuracy and computational cost of the different quantum mechanical al- 
gorithms imperfect. This ultimately justifies the need for systematic studies, such as the one 
presented here, in which the most efficient MCs are sought for the particular problem of 
interest. 

• Assuming that the MP2/6-311++G(2df,2pd) level of the theory is closer to the exact
solution of the non-relativistic electronic Schrödinger equation than B3LYP/cc-pVQZ, B3LYP
is not a reliable method to study the conformational behaviour of peptides. Even if, as we
emphasize at the end of this section, it may be dangerous to state that a method that performs
well in the particular model of an alanine residue studied here will also be recommendable
for longer and more complex peptides, we can clearly reject any method that already fails in
HCO-L-Ala-NH2.

• If B3LYP must still be used, due to, for example, computational constraints,
aug-cc-pVDZ represents a good compromise between accuracy and cost.

• The error of the studied MCs regarding the differences of energy (as measured by $d_{12}$) is
much smaller than the error in the absolute energies (as measured by $a_{12}$), suggesting that
the largest part of the discrepancy must be a systematic one.

Finally, let us stress again that the investigation performed here has used one of the simplest
dipeptides. The fact that we have treated it as an isolated system, the small size of its side chain
and also its aliphatic character all play a role in the results obtained. Hence, for bulkier residues
included in polypeptides, and especially for those that contain aromatic groups, those that are
charged or those that may participate in hydrogen bonds, the methods that have proved to be efficient
here must be re-tested, and the conclusions drawn about the B3LYP convergence to the infinite basis set
limit, as well as those regarding the comparison between B3LYP and MP2, should be re-evaluated.